Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are based only on first-order HMMs, which have a limited ability to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs that interpolate between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of the identified polymorphisms, revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations on widely studied Array-CGH data of human cell lines indicate that parsimonious HMMs are also well suited to the analysis of non-plant data. Together, these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
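To make the contrast between a mixture model and an HMM concrete, the following is a minimal sketch of the first-order baseline only. It is not the parsimonious higher-order model of the paper and does not use the Jstacs API. The three state names, the Gaussian emission means and standard deviation, the `stay` self-transition probability, and the function name `viterbi_segment` are all illustrative assumptions. Setting `stay` to 1/3 makes all transitions uniform, so every probe is labeled on its own, as in a mixture model; a larger `stay` makes neighboring probes influence each other.

```python
import math

# Hypothetical state labels for log-ratio measurements (assumption, not from the paper).
STATES = ("deletion", "unchanged", "amplification")


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def viterbi_segment(log_ratios, means=(-1.0, 0.0, 1.0), sd=0.3, stay=0.9):
    """Most likely state path of a first-order, three-state Gaussian HMM.

    All parameter values are illustrative assumptions; in practice they
    would be estimated from data.
    """
    if not 0.0 < stay < 1.0:
        raise ValueError("stay must lie strictly between 0 and 1")
    if not log_ratios:
        return []
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)  # probability of moving to each other state
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    # Uniform initial state distribution.
    score = [math.log(1.0 / n) + gaussian_log_pdf(log_ratios[0], means[s], sd)
             for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at position t + 1
    for x in log_ratios[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + gaussian_log_pdf(x, means[j], sd))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    probes = [0.0, 0.1, -0.6, 0.0, 0.05]  # made-up log-ratios with one noisy probe
    print(viterbi_segment(probes, stay=1.0 / 3.0))  # uniform transitions: mixture-like
    print(viterbi_segment(probes, stay=0.9))        # neighbors smooth the call
```

With these illustrative parameters, the isolated probe at -0.6 is called a deletion under uniform transitions and is smoothed to "unchanged" when `stay=0.9`, while a run of several strongly negative probes is still called a deletion. A higher-order model would additionally condition each state on more than one predecessor, which this sketch does not attempt.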
Related resources
Array-based Genome Comparison of Arabidopsis Ecotypes using Hidden Markov Models
Abstract: Arabidopsis thaliana is an important model organism in plant biology with a broad geographic distribution including ecotypes from Africa, America, Asia, and Europe. The natural variation of different ecotypes is expected to be reflected to a substantial degree in their genome sequences. Array comparative genomic hybridization (Array-CGH) can be used to quantify the natural variation o...
A Microarray Based Genomic Hybridization Method for Identification of New Genes in Plants: Case Analyses of Arabidopsis and Oryza
To systematically estimate the gene duplication events in closely related species, we have to use comparative genomic approaches, either through genomic sequence comparison or comparative genomic hybridization (CGH). Given the scarcity of complete genomic sequences of plant species, in the present study we adopted an array based CGH to investigate gene duplications in the genus Arabidopsis. Fra...
Hidden Markov models approach to the analysis of array CGH data
The development of solid tumors is associated with acquisition of complex genetic alterations, indicating that failures in the mechanisms that maintain the integrity of the genome contribute to tumor evolution. Thus, one expects that the particular types of genomic alterations seen in tumors reflect underlying failures in maintenance of genetic stability, as well as selection for changes that p...
Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data
MOTIVATION: Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in the genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. While a large number of approaches have been proposed for analyzing the large array CGH datasets, the relative merits of these methods in practice are not clear. RESUL...
P-243: Prenatal Diagnosis Using Array CGH: Case Presentation
Background: Karyotype analysis has been the standard and reliable procedure for prenatal cytogenetic diagnosis since the 1970s. However, the major limitation remains the requirement for cell culture, resulting in a delay of as much as 14 days to get the test results. CGH array technology has proven to be useful in detecting causative genomic imbalances or genetic mutations in as many as 15% of child...